Boosting Classification Based Speech Separation Using Temporal Dynamics
نویسندگان
چکیده
Significant advances in speech separation have been made by formulating it as a classification problem, where the desired output is the ideal binary mask (IBM). Previous work does not explicitly model the correlation between neighboring time-frequency units and standard binary classifiers are used. As one of the most important characteristics of speech signal is its temporal dynamics, the IBM contains highly structured, instead of, random patterns. In this study, we incorporate temporal dynamics into classification by employing structured output learning. In particular, we use linear-chain structured perceptrons to account for the interactions of neighboring labels in time. However, the performance of structured perceptrons largely depends on the linear separability of features. To address this problem, we employ pretrained deep neural networks to automatically learn effective feature functions for structured perceptrons. The experiments show that the proposed system significantly outperforms previous IBM estimation systems.
منابع مشابه
Phoneme Classification Using Temporal Tracking of Speech Clusters in Spectro-temporal Domain
This article presents a new feature extraction technique based on the temporal tracking of clusters in spectro-temporal features space. In the proposed method, auditory cortical outputs were clustered. The attributes of speech clusters were extracted as secondary features. However, the shape and position of speech clusters change during the time. The clusters temporally tracked and temporal tra...
متن کاملBoosting Local Spectro-Temporal Features for Speech Analysis
We introduce the problem of phone classification in the context of speech recognition, and explore several sets of local spectro-temporal features that can be used for phone classification. In particular, we present some preliminary results for phone classification using two sets of features that are commonly used for object detection: Haar features and SVMclassified Histograms of Gradients (HoG).
متن کاملSuper-human multi-talker speech recognition: the IBM 2006 speech separation challenge system
We describe a system for model based speech separation which achieves super-human recognition performance when two talkers speak at similar levels. The system can separate the speech of two speakers from a single channel recording with remarkable results. It incorporates a novel method for performing two-talker speaker identification and gain estimation. We extend the method of model based high...
متن کاملSource separation using particle filters
Our goal is to study the statistical methods for source separation based on temporal and frequency specific features by using particle filtering. Particle filtering is an advanced state-space Bayesian estimation technique that supports non-Gaussian and nonlinear models along with time-varying noise, allowing for a more accurate model of the underlying system dynamics. We present a system that c...
متن کاملSingle Channel Speech Separation Using Factorial Dynamics
Human listeners have the extraordinary ability to hear and recognize speech even when more than one person is talking. Their machine counterparts have historically been unable to compete with this ability, until now. We present a modelbased system that performs on par with humans in the task of separating speech of two talkers from a single-channel recording. Remarkably, the system surpasses hu...
متن کامل